This project serves as an anthology of the supervised learning algorithms covered throughout the semester in Data Analytics. Multiple ML algorithms and deep neural networks will be deployed, tuned, and experimented with to tackle the binary classification problem of diagnosing a patient with a certain, unspecified disease. The primary focus of this assignment is to tune and compare more advanced algorithms, such as multilayer-perceptron ANNs, against the simpler, less computationally expensive shallow algorithms from previous assignments.
Training Data Attributes:
import pandas as pd #importing some necessary packages for data preprocessing and visualization
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from sklearn import preprocessing
df = pd.read_csv('Disease Prediction Training.csv')
df.describe()
| | Age | Height | Weight | Low Blood Pressure | High Blood Pressure | Smoke | Alcohol | Exercise | Disease |
|---|---|---|---|---|---|---|---|---|---|
| count | 49000.000000 | 49000.000000 | 49000.000000 | 49000.000000 | 49000.000000 | 49000.000000 | 49000.000000 | 49000.000000 | 49000.000000 |
| mean | 52.853306 | 164.366878 | 74.190527 | 128.698939 | 96.917367 | 0.088265 | 0.054245 | 0.803204 | 0.499959 |
| std | 6.763065 | 8.216637 | 14.329934 | 147.624582 | 200.368069 | 0.283683 | 0.226503 | 0.397581 | 0.500005 |
| min | 29.000000 | 55.000000 | 10.000000 | -150.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 48.000000 | 159.000000 | 65.000000 | 120.000000 | 80.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 |
| 50% | 53.000000 | 165.000000 | 72.000000 | 120.000000 | 80.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 |
| 75% | 58.000000 | 170.000000 | 82.000000 | 140.000000 | 90.000000 | 0.000000 | 0.000000 | 1.000000 | 1.000000 |
| max | 64.000000 | 207.000000 | 200.000000 | 14020.000000 | 11000.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
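The summary statistics above already expose physiologically impossible readings: Low Blood Pressure ranges from -150 to 14020, and High Blood Pressure reaches 11000. A minimal sanity-check sketch on toy values (the range bounds here are illustrative assumptions, not the fences used later):

```python
import pandas as pd

# Toy frame standing in for the training data; values chosen to include
# the kinds of impossible readings visible in df.describe()
bp = pd.DataFrame({
    "Low Blood Pressure": [-150, 120, 140, 14020],
    "High Blood Pressure": [0, 80, 90, 11000],
})

# Flag rows whose readings fall inside a generous physiological range
plausible = bp["Low Blood Pressure"].between(40, 250) & \
            bp["High Blood Pressure"].between(30, 200)

print(plausible.sum(), "of", len(bp), "rows look plausible")  # 2 of 4
```

A mask like this is only a first-pass filter; the actual outlier handling below uses fence-based clipping instead.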
#Plotting the distributions of Height, Weight, and our binary categorical variables with respect to Disease
var = ['Height','Weight','Smoke','Alcohol','Exercise']
df_cont = df[var]
for col in df_cont:
    fig = px.histogram(df, x=col, color="Disease")
    fig.update_layout(
        autosize=False,
        width=800,
        height=300
    )
    fig.show()
#Height and weight are approximately normal in distribution. Height has a large left tail, which tells me it needs to be trimmed.
#Weight is right-skewed.
#Disease is also more common amongst females than males, which is counterintuitive if this relates to coronary disease,
#but more females were surveyed overall (~31k females to ~17.2k males)
#Heavier people were most commonly associated with disease (sensible)
#Low Blood Pressure is EXTREMELY right-skewed; would consider masking/clipping outliers
#Same with High Blood Pressure.
#ALMOST EVERY SINGLE PATIENT with IRREGULAR BP has the disease
#High/too-high cholesterol commonly leads to the disease
#The same trend is noted for glucose levels across individuals
#Far fewer people smoke, and the disease split is roughly 50/50 within both the smoking and non-smoking groups
#The same can be said for those who drink alcohol.
#Oddly enough, patients who exercise regularly are just as susceptible to this disease as those who don't, even though far more people in this study exercise than not.
#The disease classes are split exactly 50/50, which luckily minimizes sampling bias.
#df.isnull().sum()
#0 missing values from any attribute
sns.heatmap(df.corr()) #Weight and Age have the strongest positive correlations to Disease, while Smoke has the strongest negative correlation
#This matches the patterns noted in the histograms above, which validates my suspicions.
How often do you smoke: one cigarette per week, or one pack per day? The same ambiguity applies to Alcohol and Exercise.
If the patient was simply asked these questions on the spot, they may have good reason to be dishonest; their insurance premiums may rise if they answer yes.
Alcoholism, for example, can lead to liver cirrhosis, which can be inferred from markers such as excess bilirubin build-up in the blood. Blood percolates through the liver, so if the liver is damaged you get high venous blood pressure, which is called portal hypertension.
Ironically, alcohol raises HDL (good cholesterol), so people who only drink and don't smoke rarely develop coronary disease, though they may die of liver failure instead. Objective biomarkers like these would make such attributes much more reliable in our model than self-reported yes/no answers.
If a patient exercises six times per week for two hours per day, they should have very good blood circulation, but their systolic blood pressure may be a bit high, meaning they are exerting more energy than would be considered healthy. If another patient considers moving at all to be sufficient exercise, then we're going to see inconsistency in the results.
If a patient has other conditions such as diabetes, their HbA1c (glucose levels) may be in an unhealthy range, which could promote atherosclerosis and thus increase the chance of coronary disease. You can see how quickly the data collection process can become too expensive or infeasible. There are thousands of other factors/conditions (hereditary or habitual) that could be excellent predictors of cardiac disease, but what if we would also like to quantify the type or severity of the disease? Heart disease is a blanket term that can refer to either the arteries (blockage/obstruction) or the heart in general. Arteriosclerosis occurs when the epicardial arteries can't provide enough oxygen to the heart; monitoring LDL/HDL/triglycerides/cholesterol would help distinguish these cases.
var = ['Height','Weight','High Blood Pressure','Low Blood Pressure']
df_cont = df[var]
print(df_cont.quantile(0.01, numeric_only=True))
print(df_cont.quantile(0.98, numeric_only=True))
Height                 147.0
Weight                  48.0
High Blood Pressure     60.0
Low Blood Pressure      90.0
Name: 0.01, dtype: float64
Height                 181.0
Weight                 110.0
High Blood Pressure    110.0
Low Blood Pressure     170.0
Name: 0.98, dtype: float64
# Clipping Blood Pressure just past their upper/lower fence values so that
# clipped rows land on a sentinel value that can be flagged below
df["High Blood Pressure"] = df["High Blood Pressure"].clip(upper=106) #Upper fence = 105
df["High Blood Pressure"] = df["High Blood Pressure"].clip(lower=65)  #Lower fence = 65
df["Low Blood Pressure"] = df["Low Blood Pressure"].clip(upper=171)   #Upper fence = 170
df["Low Blood Pressure"] = df["Low Blood Pressure"].clip(lower=89)    #Lower fence = 90
# Clipping Height and Weight values to their 1st Percentile
df["Height"] = df["Height"].clip(lower=147)
df["Weight"] = df["Weight"].clip(lower=49)
#Replacing the sentinel (clipped) values with NaN so they can be imputed later
df.loc[df['High Blood Pressure'] == 106,'High Blood Pressure'] = np.nan
df.loc[df['High Blood Pressure'] == 65,'High Blood Pressure'] = np.nan
df.loc[df['Low Blood Pressure'] == 171,'Low Blood Pressure'] = np.nan
df.loc[df['Low Blood Pressure'] == 89,'Low Blood Pressure'] = np.nan
df.loc[df['Weight'] == 49,'Weight'] = np.nan
df.loc[df['Height'] == 147,'Height'] = np.nan
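The clip-then-NaN pattern above can be verified on a toy Series (fence value and data invented for illustration). One caveat worth noting: a genuine reading that happens to sit exactly on the sentinel would also be marked missing.

```python
import numpy as np
import pandas as pd

s = pd.Series([50.0, 120.0, 140.0, 900.0])  # one absurd reading

# Clip one unit past the fence so clipped rows land on a unique sentinel
s = s.clip(upper=171)
# Any value sitting exactly on the sentinel was an outlier: mark it missing
s[s == 171] = np.nan

print(s.tolist())  # the 900.0 reading is now NaN; in-range values untouched
```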
#NOTE: Males are encoded as 1 and females as 0. The assignment is arbitrary, but our models require numeric inputs
df["Gender"] = df["Gender"].map({'male': 1, 'female': 0})
#df = pd.get_dummies(df, columns = ['Gender','Cholesterol','Glucose'])
df["Cholesterol"] = df["Cholesterol"].map({'too high': 2, 'high': 1, 'normal':0})
df["Glucose"] = df["Glucose"].map({'too high': 2, 'high': 1, 'normal':0})
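One caveat with the map-based ordinal encoding above: any spelling not listed in the dictionary silently becomes NaN, so it is worth checking afterwards. A toy illustration (values invented):

```python
import pandas as pd

s = pd.Series(['normal', 'high', 'too high', 'High'])  # note the stray capitalization
encoded = s.map({'too high': 2, 'high': 1, 'normal': 0})

print(encoded.isna().sum())  # 1 -- the unmapped 'High' became NaN
```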
I've used multiple imputation to account for the uncertainty associated with each imputed value, as opposed to using K-Nearest Neighbors. Single imputation doesn't account for that uncertainty, so MICE will help us find the most likely value overall.
MICE essentially uses chained multiple regressions to model the distribution of each feature in order to accurately impute what is missing. I've found this method to be very flexible and much less time-consuming than K-Nearest Neighbors.
Although Low/High BP contribute a large portion of the outliers in our data set, they show a strong correlation to Disease. I initially tried dropping the outliers, but recall increased notably once I switched to this imputation approach instead.
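The cell below uses fancyimpute's IterativeImputer; scikit-learn ships an equivalent chained-regression imputer (still experimental, so it must be enabled explicitly). A minimal sketch on toy data, not the project dataframe:

```python
import numpy as np
# The experimental import is required before IterativeImputer can be used
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

X = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, np.nan],   # missing value, to be regressed from column 0
              [4.0, 8.0]])

imputer = IterativeImputer(random_state=0)
X_filled = imputer.fit_transform(X)
# The missing entry is estimated from the (here perfectly linear) relation y = 2x,
# so the imputed value lands near 6
```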
from fancyimpute import IterativeImputer
MICE_imputer = IterativeImputer()
df_MICE = df.copy()  # work on a copy so the original frame is preserved
df_MICE.iloc[:,:] = MICE_imputer.fit_transform(df)
df = df_MICE
df.describe() #cleaned dataframe
| | Age | Gender | Height | Weight | Low Blood Pressure | High Blood Pressure | Cholesterol | Glucose | Smoke | Alcohol | Exercise | Disease |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 49000.000000 | 49000.000000 | 49000.000000 | 49000.000000 | 49000.000000 | 49000.000000 | 49000.000000 | 49000.000000 | 49000.000000 | 49000.000000 | 49000.000000 | 49000.000000 |
| mean | 52.853306 | 0.349735 | 164.641584 | 74.519477 | 126.429702 | 81.794306 | 0.366184 | 0.225898 | 0.088265 | 0.054245 | 0.803204 | 0.499959 |
| std | 6.763065 | 0.476891 | 7.550732 | 13.918317 | 15.399712 | 7.985393 | 0.679301 | 0.571623 | 0.283683 | 0.226503 | 0.397581 | 0.500005 |
| min | 29.000000 | 0.000000 | 148.000000 | 50.000000 | 90.000000 | 66.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 48.000000 | 0.000000 | 159.000000 | 65.000000 | 120.000000 | 80.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 |
| 50% | 53.000000 | 0.000000 | 165.000000 | 72.000000 | 120.000000 | 80.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 |
| 75% | 58.000000 | 1.000000 | 170.000000 | 82.000000 | 140.000000 | 90.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 1.000000 |
| max | 64.000000 | 1.000000 | 207.000000 | 200.000000 | 170.000000 | 105.000000 | 2.000000 | 2.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
plt.subplots(figsize=(15,6))
df[['Age','Height','Weight','Low Blood Pressure','High Blood Pressure']].boxplot(patch_artist=True,sym="k.")
#df.to_csv('df_preprocessed.csv', index=False) # export for Google Colab
#Importing necessary packages for Our initial models
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score, ShuffleSplit
from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve
We would like to maximize recall in this context because it is better to predict disease in a healthy patient than to return a false negative for a patient who actually has it. A false negative could cause someone to miss the critical window for proper care, so this should be treated as a life-or-death situation.
In the case of ANNs, we will tune the models on the minimum of the binary cross-entropy loss, paired with model accuracy. These MLPs tend to show more balanced results across metrics, which is why the ANN predictions will still be ranked in order of recall.
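The recall-versus-precision trade-off can be made concrete with a small invented example; 1 marks a patient who has the disease:

```python
from sklearn.metrics import recall_score, precision_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]  # two false negatives, one false positive

# recall = TP / (TP + FN) = 2 / 4; precision = TP / (TP + FP) = 2 / 3
print(recall_score(y_true, y_pred))     # 0.5
print(precision_score(y_true, y_pred))  # ~0.667
```

The two missed positives are exactly the failure mode we most want to avoid here, which is why recall is the ranking metric.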
scaler = MinMaxScaler()
scaler.fit(df.drop('Disease',axis=1))
scaled_features = scaler.transform(df.drop('Disease',axis=1))
X_train, X_test, y_train, y_test = train_test_split(scaled_features,df['Disease'],
test_size=0.30, random_state=42)
print(X_train.shape)
print(y_train.shape)
(34300, 11)
(34300,)
import warnings
from sklearn.exceptions import ConvergenceWarning
warnings.filterwarnings("ignore", category=ConvergenceWarning)
#logr_clf = make_pipeline(MinMaxScaler(),LogisticRegression()) #MinMaxScaler())
logr_clf = LogisticRegression(solver='lbfgs', random_state=42)
logr_clf.fit(X_train, y_train)
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, l1_ratio=None, max_iter=100,
multi_class='warn', n_jobs=None, penalty='l2',
random_state=42, solver='lbfgs', tol=0.0001, verbose=0,
warm_start=False)
y_lr = logr_clf.predict(X_test)
#confusion_matrix = pd.crosstab(y_test, y_lr, rownames=['Actual'], colnames=['lricted'])
from sklearn import metrics
print(f"Precision: {round(metrics.precision_score(y_test,y_lr)*100,3)}%")
print(f"Recall: {round(metrics.recall_score(y_test,y_lr)*100,3)}%")
print(f"Accuracy: {round(metrics.accuracy_score(y_test,y_lr)*100,3)}%")
from sklearn.metrics import classification_report,confusion_matrix
print(confusion_matrix(y_test,y_lr))
print(classification_report(y_test,y_lr))
Precision: 76.104%
Recall: 67.771%
Accuracy: 72.973%
[[5695 1580]
[2393 5032]]
precision recall f1-score support
0.0 0.70 0.78 0.74 7275
1.0 0.76 0.68 0.72 7425
accuracy 0.73 14700
macro avg 0.73 0.73 0.73 14700
weighted avg 0.73 0.73 0.73 14700
From the scikit-learn documentation: 'liblinear' supports the 'l2' penalty, and for a binary target like ours the one-vs-rest scheme reduces to fitting a single binary classifier. This makes 'liblinear' a natural solver choice for the model tuning below.
from sklearn.model_selection import GridSearchCV
from sklearn import metrics
param_grid = {'C':[0.001,0.1,1],
'solver':['liblinear'],
'penalty':['l2']
}#'max_iter':[1000]}
grid = GridSearchCV(logr_clf, param_grid, cv=10, scoring='recall')
grid.fit(X_train, y_train)
GridSearchCV(cv=10, error_score='raise-deprecating',
estimator=LogisticRegression(C=1.0, class_weight=None, dual=False,
fit_intercept=True,
intercept_scaling=1, l1_ratio=None,
max_iter=100, multi_class='warn',
n_jobs=None, penalty='l2',
random_state=42, solver='lbfgs',
tol=0.0001, verbose=0,
warm_start=False),
iid='warn', n_jobs=None,
param_grid={'C': [0.001, 0.1, 1], 'penalty': ['l2'],
'solver': ['liblinear']},
pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
scoring='recall', verbose=0)
bestLR = grid.best_estimator_
bestLR.fit(X_train,y_train)
print(grid.best_params_)
y_lr = bestLR.predict(X_test)
print(confusion_matrix(y_test,y_lr))
print('\n')
print(classification_report(y_test,y_lr))
print(f"Precision: {round(metrics.precision_score(y_test,y_lr)*100,3)}%")
print(f"Recall: {round(metrics.recall_score(y_test,y_lr)*100,3)}%")
print(f"Accuracy: {round(metrics.accuracy_score(y_test,y_lr)*100,3)}%")
{'C': 0.001, 'penalty': 'l2', 'solver': 'liblinear'}
[[5148 2127]
[2192 5233]]
precision recall f1-score support
0.0 0.70 0.71 0.70 7275
1.0 0.71 0.70 0.71 7425
accuracy 0.71 14700
macro avg 0.71 0.71 0.71 14700
weighted avg 0.71 0.71 0.71 14700
Precision: 71.101%
Recall: 70.478%
Accuracy: 70.619%
from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt
def roc_curve_plot(clf, title, label):
# generate a no skill prediction (majority class)
ns_probs = [0 for _ in range(len(y_test))]
# predict probabilities
lr_probs = clf.predict_proba(X_test)
# keep probabilities for the positive outcome only
lr_probs = lr_probs[:, 1]
# calculate scores
ns_auc = roc_auc_score(y_test, ns_probs)
lr_auc = roc_auc_score(y_test, lr_probs)
# summarize scores
#print('No Skill: ROC AUC=%.2f' % (ns_auc))
print(f"ROC AUC: {round((lr_auc)*100,2)}%")
# calculate roc curves
ns_fpr, ns_tpr, _ = roc_curve(y_test, ns_probs)
lr_fpr, lr_tpr, _ = roc_curve(y_test, lr_probs)
# plot the roc curve for the model
plt.figure(figsize=(12,8))
plt.title(title)
plt.plot(ns_fpr, ns_tpr, linestyle='--', label='No Skill')
plt.plot(lr_fpr, lr_tpr, marker='.', label=label)
# axis labels
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
# show the legend
plt.legend()
# show the plot
plt.show()
roc_curve_plot(bestLR, 'Logistic Regression', 'LR')
# Our area under the curve indicates that this model is much more accurate than random guessing
# It is a bit jagged between 0.4-0.6 TPR, which reflects how this model was trained
ROC AUC: 77.23%
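The ROC AUC reported above can be sanity-checked against its pairwise-ranking interpretation on a four-point invented example:

```python
from sklearn.metrics import roc_auc_score

y_true  = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# AUC = probability a random positive outscores a random negative.
# Positive/negative pairs: (0.35,0.1) win, (0.35,0.4) loss, (0.8,0.1) win,
# (0.8,0.4) win  ->  3/4
print(roc_auc_score(y_true, y_score))  # 0.75
```

An AUC of ~0.77 for the logistic model therefore means a randomly chosen diseased patient receives a higher predicted probability than a randomly chosen healthy one about 77% of the time.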
Now we will explore artificial neural networks with up to 2 hidden layers. Note that each model has been grid searched in Google Colab to obtain satisfactory hyperparameters.
#Neural nets can face the problem of plateaus.
#Patience = 15 means the model waits 15 epochs to check if
#val_loss decreases further before stopping the training process.
from keras.callbacks import EarlyStopping
e_stop = EarlyStopping(monitor='val_loss', min_delta=0, patience=15,verbose=0, mode='auto')
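The patience mechanism can be mimicked in plain Python to see when training would halt. This is a simplified sketch of the idea, not Keras's exact implementation; the synthetic loss values are invented and patience is shortened to 3 for brevity:

```python
def early_stop_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 1-indexed epoch at which training halts, or None if it never does."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best, wait = loss, 0      # improvement: reset the patience counter
        else:
            wait += 1                 # no improvement this epoch
            if wait >= patience:
                return epoch          # patience exhausted: stop training
    return None

# Loss plateaus after epoch 4, so with patience=3 training stops at epoch 7
losses = [0.70, 0.60, 0.56, 0.554, 0.554, 0.555, 0.554]
print(early_stop_epoch(losses))  # 7
```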
# ANN0: neural network with 0 hidden layers
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import cross_validate
#scaler = MinMaxScaler()
scaler = StandardScaler()
scaler.fit(df.drop('Disease',axis=1))
scaled_features = scaler.transform(df.drop('Disease',axis=1))
X_train, X_test, y_train, y_test = train_test_split(scaled_features,df['Disease'],
test_size=0.30, random_state=42)
# Importing the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Dense
ANN0 = Sequential()
# Adding the output layer (no hidden layers)
ANN0.add(Dense(units = 1, activation = 'sigmoid', input_dim = 11))
# Compiling the ANN
ANN0.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
# Fitting the ANN to the Training set
ANN0.fit(X_train, y_train, validation_data=(X_test,y_test), batch_size = 100, epochs = 27, callbacks=[e_stop])
#Making the predictions and evaluating the model
# Predicting the Test set results
y_pred_ANN0 = ANN0.predict(X_test)
y_pred_ANN0 = (y_pred_ANN0 > 0.5)
Train on 34300 samples, validate on 14700 samples
Epoch 1/27  - loss: 0.8653 - accuracy: 0.5086 - val_loss: 0.7547 - val_accuracy: 0.5561
[... epochs 2-26 omitted: loss plateaus near 0.5535 from roughly epoch 13 onward ...]
Epoch 27/27 - loss: 0.5535 - accuracy: 0.7315 - val_loss: 0.5543 - val_accuracy: 0.7311
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
print(f"Precision: {round(metrics.precision_score(y_test,y_pred_ANN0)*100,3)}%")
print(f"Recall: {round(metrics.recall_score(y_test,y_pred_ANN0)*100,3)}%")
print(f"Accuracy: {round(metrics.accuracy_score(y_test, y_pred_ANN0)*100,3)}%")
print(confusion_matrix(y_test,y_pred_ANN0))
print(classification_report(y_test,y_pred_ANN0))
Precision: 76.279%
Recall: 67.865%
Accuracy: 73.109%
[[5708 1567]
[2386 5039]]
precision recall f1-score support
0.0 0.71 0.78 0.74 7275
1.0 0.76 0.68 0.72 7425
accuracy 0.73 14700
macro avg 0.73 0.73 0.73 14700
weighted avg 0.73 0.73 0.73 14700
def roc_curve_ANN(clf, title, label):
# generate a no skill prediction (majority class)
ns_probs = [0 for _ in range(len(y_test))]
# predict probabilities
lr_probs = clf.predict_proba(X_test)
# keep probabilities for the positive outcome only
lr_probs = lr_probs[:]
# calculate scores
ns_auc = roc_auc_score(y_test, ns_probs)
lr_auc = roc_auc_score(y_test, lr_probs)
# summarize scores
#print('No Skill: ROC AUC=%.2f' % (ns_auc))
print(f"ROC AUC: {round((lr_auc)*100,2)}%")
# calculate roc curves
ns_fpr, ns_tpr, _ = roc_curve(y_test, ns_probs)
lr_fpr, lr_tpr, _ = roc_curve(y_test, lr_probs)
# plot the roc curve for the model
plt.figure(figsize=(12,8))
plt.title(title)
plt.plot(ns_fpr, ns_tpr, linestyle='--', label='No Skill')
plt.plot(lr_fpr, lr_tpr, marker='.', label=label)
# axis labels
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
# show the legend
plt.legend()
# show the plot
plt.show()
roc_curve_ANN(ANN0,"Artificial Neural Network w/ 0 Hidden Layers","ANN0")
ROC AUC: 79.55%
The early stopping technique helped determine where the model's loss was minimized. The patience of 15 epochs allowed the model to keep exploring the training data to check whether the loss could decrease further (rather than stopping on a temporary plateau).
This model bottomed out at a loss of 0.5535 on the training data and about 0.5543 on the validation data, indicating that the fit generalizes nicely to our validation set. If the validation loss were substantially higher than the training loss, that would signal overfitting.
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV
from keras.models import Sequential
from keras.layers import Dense
#Initialising the ANN
ANN1 = Sequential()
# Adding the input layer and the first hidden layer
ANN1.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11))
# Adding the output layer
ANN1.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
# Compiling the ANN
ANN1.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
# Fitting the ANN to the Training set
ANN1.fit(X_train, y_train, validation_data=(X_test,y_test), batch_size = 88, epochs = 120) #callbacks = [e_stop])
# Predicting the Test set results
y_pred_ANN1 = ANN1.predict(X_test)
y_pred_ANN1 = (y_pred_ANN1 > 0.5)
Train on 34300 samples, validate on 14700 samples
Epoch 1/120  - loss: 0.6097 - accuracy: 0.6885 - val_loss: 0.5550 - val_accuracy: 0.7280
[... epochs 2-66 omitted: losses decrease slowly, with val_loss settling around 0.5434 ...]
Epoch 67/120 - loss: 0.5426 (log truncated)
- accuracy: 0.7349 - val_loss: 0.5432 - val_accuracy: 0.7350 Epoch 68/120 34300/34300 [==============================] - 1s 16us/step - loss: 0.5425 - accuracy: 0.7348 - val_loss: 0.5434 - val_accuracy: 0.7346 Epoch 69/120 34300/34300 [==============================] - 0s 13us/step - loss: 0.5424 - accuracy: 0.7346 - val_loss: 0.5434 - val_accuracy: 0.7342 Epoch 70/120 34300/34300 [==============================] - 0s 14us/step - loss: 0.5424 - accuracy: 0.7351 - val_loss: 0.5434 - val_accuracy: 0.7349 Epoch 71/120 34300/34300 [==============================] - 1s 17us/step - loss: 0.5425 - accuracy: 0.7348 - val_loss: 0.5434 - val_accuracy: 0.7359 Epoch 72/120 34300/34300 [==============================] - 1s 17us/step - loss: 0.5425 - accuracy: 0.7351 - val_loss: 0.5433 - val_accuracy: 0.7347 Epoch 73/120 34300/34300 [==============================] - 1s 16us/step - loss: 0.5424 - accuracy: 0.7355 - val_loss: 0.5432 - val_accuracy: 0.7352 Epoch 74/120 34300/34300 [==============================] - 1s 19us/step - loss: 0.5425 - accuracy: 0.7346 - val_loss: 0.5433 - val_accuracy: 0.7356 Epoch 75/120 34300/34300 [==============================] - 1s 19us/step - loss: 0.5424 - accuracy: 0.7350 - val_loss: 0.5433 - val_accuracy: 0.7346 Epoch 76/120 34300/34300 [==============================] - 1s 18us/step - loss: 0.5423 - accuracy: 0.7350 - val_loss: 0.5437 - val_accuracy: 0.7336 Epoch 77/120 34300/34300 [==============================] - 1s 20us/step - loss: 0.5423 - accuracy: 0.7350 - val_loss: 0.5434 - val_accuracy: 0.7342 Epoch 78/120 34300/34300 [==============================] - 0s 14us/step - loss: 0.5424 - accuracy: 0.7347 - val_loss: 0.5431 - val_accuracy: 0.7346 Epoch 79/120 34300/34300 [==============================] - 1s 24us/step - loss: 0.5423 - accuracy: 0.7354 - val_loss: 0.5430 - val_accuracy: 0.7351 Epoch 80/120 34300/34300 [==============================] - 1s 22us/step - loss: 0.5423 - accuracy: 0.7349 - val_loss: 0.5431 - val_accuracy: 0.7352 
Epoch 81/120 34300/34300 [==============================] - 1s 21us/step - loss: 0.5422 - accuracy: 0.7352 - val_loss: 0.5431 - val_accuracy: 0.7352 Epoch 82/120 34300/34300 [==============================] - 1s 19us/step - loss: 0.5424 - accuracy: 0.7351 - val_loss: 0.5433 - val_accuracy: 0.7357 Epoch 83/120 34300/34300 [==============================] - 1s 18us/step - loss: 0.5422 - accuracy: 0.7354 - val_loss: 0.5430 - val_accuracy: 0.7355 Epoch 84/120 34300/34300 [==============================] - 1s 18us/step - loss: 0.5422 - accuracy: 0.7354 - val_loss: 0.5430 - val_accuracy: 0.7355 Epoch 85/120 34300/34300 [==============================] - 0s 14us/step - loss: 0.5422 - accuracy: 0.7354 - val_loss: 0.5432 - val_accuracy: 0.7343 Epoch 86/120 34300/34300 [==============================] - 1s 16us/step - loss: 0.5421 - accuracy: 0.7346 - val_loss: 0.5431 - val_accuracy: 0.7348 Epoch 87/120 34300/34300 [==============================] - 0s 14us/step - loss: 0.5421 - accuracy: 0.7353 - val_loss: 0.5429 - val_accuracy: 0.7361 Epoch 88/120 34300/34300 [==============================] - 1s 16us/step - loss: 0.5421 - accuracy: 0.7348 - val_loss: 0.5431 - val_accuracy: 0.7347 Epoch 89/120 34300/34300 [==============================] - 1s 18us/step - loss: 0.5421 - accuracy: 0.7345 - val_loss: 0.5431 - val_accuracy: 0.7356 Epoch 90/120 34300/34300 [==============================] - 1s 19us/step - loss: 0.5420 - accuracy: 0.7356 - val_loss: 0.5431 - val_accuracy: 0.7359 Epoch 91/120 34300/34300 [==============================] - 1s 20us/step - loss: 0.5420 - accuracy: 0.7349 - val_loss: 0.5432 - val_accuracy: 0.7345 Epoch 92/120 34300/34300 [==============================] - 1s 20us/step - loss: 0.5420 - accuracy: 0.7354 - val_loss: 0.5430 - val_accuracy: 0.7356 Epoch 93/120 34300/34300 [==============================] - 1s 19us/step - loss: 0.5420 - accuracy: 0.7348 - val_loss: 0.5430 - val_accuracy: 0.7359 Epoch 94/120 34300/34300 [==============================] - 1s 
21us/step - loss: 0.5420 - accuracy: 0.7349 - val_loss: 0.5428 - val_accuracy: 0.7354 Epoch 95/120 34300/34300 [==============================] - 1s 15us/step - loss: 0.5419 - accuracy: 0.7352 - val_loss: 0.5430 - val_accuracy: 0.7352 Epoch 96/120 34300/34300 [==============================] - 1s 21us/step - loss: 0.5419 - accuracy: 0.7344 - val_loss: 0.5431 - val_accuracy: 0.7347 Epoch 97/120 34300/34300 [==============================] - 1s 19us/step - loss: 0.5420 - accuracy: 0.7348 - val_loss: 0.5429 - val_accuracy: 0.7354 Epoch 98/120 34300/34300 [==============================] - 1s 17us/step - loss: 0.5420 - accuracy: 0.7354 - val_loss: 0.5429 - val_accuracy: 0.7364 Epoch 99/120 34300/34300 [==============================] - 1s 19us/step - loss: 0.5419 - accuracy: 0.7362 - val_loss: 0.5430 - val_accuracy: 0.7360 Epoch 100/120 34300/34300 [==============================] - 0s 13us/step - loss: 0.5419 - accuracy: 0.7357 - val_loss: 0.5430 - val_accuracy: 0.7352 Epoch 101/120 34300/34300 [==============================] - 0s 13us/step - loss: 0.5419 - accuracy: 0.7357 - val_loss: 0.5428 - val_accuracy: 0.7372 Epoch 102/120 34300/34300 [==============================] - 0s 14us/step - loss: 0.5419 - accuracy: 0.7354 - val_loss: 0.5428 - val_accuracy: 0.7360 Epoch 103/120 34300/34300 [==============================] - 0s 13us/step - loss: 0.5418 - accuracy: 0.7350 - val_loss: 0.5428 - val_accuracy: 0.7352 Epoch 104/120 34300/34300 [==============================] - 0s 13us/step - loss: 0.5418 - accuracy: 0.7352 - val_loss: 0.5427 - val_accuracy: 0.7359 Epoch 105/120 34300/34300 [==============================] - 0s 15us/step - loss: 0.5417 - accuracy: 0.7356 - val_loss: 0.5430 - val_accuracy: 0.7355 Epoch 106/120 34300/34300 [==============================] - 1s 20us/step - loss: 0.5418 - accuracy: 0.7346 - val_loss: 0.5431 - val_accuracy: 0.7361 Epoch 107/120 34300/34300 [==============================] - 1s 15us/step - loss: 0.5417 - accuracy: 0.7355 - 
val_loss: 0.5430 - val_accuracy: 0.7358 Epoch 108/120 34300/34300 [==============================] - 1s 16us/step - loss: 0.5417 - accuracy: 0.7348 - val_loss: 0.5429 - val_accuracy: 0.7352 Epoch 109/120 34300/34300 [==============================] - 0s 14us/step - loss: 0.5417 - accuracy: 0.7355 - val_loss: 0.5434 - val_accuracy: 0.7351 Epoch 110/120 34300/34300 [==============================] - 0s 12us/step - loss: 0.5418 - accuracy: 0.7350 - val_loss: 0.5430 - val_accuracy: 0.7354 Epoch 111/120 34300/34300 [==============================] - 0s 13us/step - loss: 0.5416 - accuracy: 0.7348 - val_loss: 0.5435 - val_accuracy: 0.7352 Epoch 112/120 34300/34300 [==============================] - 0s 13us/step - loss: 0.5418 - accuracy: 0.7348 - val_loss: 0.5429 - val_accuracy: 0.7359 Epoch 113/120 34300/34300 [==============================] - 0s 13us/step - loss: 0.5417 - accuracy: 0.7351 - val_loss: 0.5429 - val_accuracy: 0.7355 Epoch 114/120 34300/34300 [==============================] - 1s 18us/step - loss: 0.5417 - accuracy: 0.7343 - val_loss: 0.5431 - val_accuracy: 0.7365 Epoch 115/120 34300/34300 [==============================] - 1s 16us/step - loss: 0.5417 - accuracy: 0.7343 - val_loss: 0.5433 - val_accuracy: 0.7358 Epoch 116/120 34300/34300 [==============================] - 1s 16us/step - loss: 0.5416 - accuracy: 0.7354 - val_loss: 0.5430 - val_accuracy: 0.7365 Epoch 117/120 34300/34300 [==============================] - 1s 17us/step - loss: 0.5415 - accuracy: 0.7355 - val_loss: 0.5430 - val_accuracy: 0.7368 Epoch 118/120 34300/34300 [==============================] - 1s 15us/step - loss: 0.5416 - accuracy: 0.7345 - val_loss: 0.5428 - val_accuracy: 0.7356 Epoch 119/120 34300/34300 [==============================] - 1s 15us/step - loss: 0.5415 - accuracy: 0.7347 - val_loss: 0.5428 - val_accuracy: 0.7367 Epoch 120/120 34300/34300 [==============================] - 1s 15us/step - loss: 0.5416 - accuracy: 0.7345 - val_loss: 0.5429 - val_accuracy: 0.7359
# Making the Confusion Matrix
from sklearn import metrics
from sklearn.metrics import confusion_matrix, classification_report
print(f"Precision: {round(metrics.precision_score(y_test,y_pred_ANN1)*100,3)}%")
print(f"Recall: {round(metrics.recall_score(y_test,y_pred_ANN1)*100,4)}%")
print(f"Accuracy: {round(metrics.accuracy_score(y_test, y_pred_ANN1)*100,4)}%")
print(confusion_matrix(y_test,y_pred_ANN1))
print(classification_report(y_test,y_pred_ANN1))
Precision: 75.862%
Recall: 69.9663%
Accuracy: 73.585%
[[5622 1653]
[2230 5195]]
precision recall f1-score support
0.0 0.72 0.77 0.74 7275
1.0 0.76 0.70 0.73 7425
accuracy 0.74 14700
macro avg 0.74 0.74 0.74 14700
weighted avg 0.74 0.74 0.74 14700
roc_curve_ANN(ANN1, 'Artificial Neural Network w/ 1 Hidden Layer','ANN-1')
ROC AUC: 80.18%
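The `roc_curve_ANN` helper is defined earlier in the notebook; a minimal sketch of what it computes (the helper name and plotting are assumptions here, only the sklearn calls are standard) might look like:

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

def roc_curve_ann_sketch(scores, y_true):
    """Compute ROC curve points and AUC from predicted probabilities.

    `scores` are the raw sigmoid outputs (probabilities for class 1),
    not the thresholded 0/1 predictions -- thresholding first would
    collapse the curve to a single operating point.
    """
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    return fpr, tpr, auc(fpr, tpr)

# Toy example: mostly well-separated scores give a high AUC.
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])
fpr, tpr, roc_auc = roc_curve_ann_sketch(scores, y_true)
print(f"ROC AUC: {roc_auc*100:.2f}%")  # -> ROC AUC: 75.00%
```

In the notebook the helper also plots `fpr` against `tpr`; the key point is that the curve must be built from probabilities, not from `y_pred_ANN1` after the `> 0.5` cut.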
# Initialising the ANN
ANN2 = Sequential()
# Adding the input layer and the first hidden layer (Keras 2 API)
ANN2.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11))
# Adding the second hidden layer
ANN2.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))
# Adding the output layer
ANN2.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
# Compiling the ANN
ANN2.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
# Fitting the ANN to the Training set
ANN2.fit(X_train, y_train, validation_data = (X_test, y_test), batch_size = 42, epochs = 110) #callbacks = [e_stop]
# Predicting the Test set results
y_pred_ANN2 = ANN2.predict(X_test)
y_pred_ANN2 = (y_pred_ANN2 > 0.5)
Train on 34300 samples, validate on 14700 samples
Epoch 1/110 34300/34300 [==============================] - 1s 39us/step - loss: 0.5755 - accuracy: 0.7205 - val_loss: 0.5457 - val_accuracy: 0.7299
[... epochs 2-109 omitted; loss plateaus near 0.537 and val_accuracy near 0.738 ...]
Epoch 110/110 34300/34300 [==============================] - 1s 29us/step - loss: 0.5373 - accuracy: 0.7361 - val_loss: 0.5379 - val_accuracy: 0.7376
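The fit call above comments out a `callbacks = [e_stop]` argument, suggesting early stopping was tried. The core patience rule behind such a callback can be sketched framework-free (`early_stop_epoch` is a hypothetical helper illustrating the logic, not the Keras `EarlyStopping` class itself):

```python
def early_stop_epoch(val_losses, patience=5, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None.

    Mirrors the patience rule used by callbacks such as Keras's
    EarlyStopping: stop once validation loss has failed to improve
    by more than `min_delta` for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None  # ran all epochs without triggering

# A loss curve that plateaus after epoch 3 stops at epoch 3 + patience.
losses = [0.60, 0.55, 0.54, 0.54, 0.54, 0.54, 0.54, 0.54]
print(early_stop_epoch(losses, patience=3))  # -> 6
```

Given how flat the validation loss is in the log above, a patience-based stop would likely have cut training well short of 110 epochs at negligible cost in accuracy.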
# Making the Confusion Matrix
from sklearn import metrics
from sklearn.metrics import confusion_matrix, classification_report
print(f"Precision: {round(metrics.precision_score(y_test,y_pred_ANN2)*100,3)}%")
print(f"Recall: {round(metrics.recall_score(y_test,y_pred_ANN2)*100,3)}%")
print(f"Accuracy: {round(metrics.accuracy_score(y_test,y_pred_ANN2)*100,3)}%")
print(confusion_matrix(y_test,y_pred_ANN2))
print(classification_report(y_test,y_pred_ANN2))
Precision: 75.673%
Recall: 70.801%
Accuracy: 73.755%
[[5585 1690]
[2168 5257]]
precision recall f1-score support
0.0 0.72 0.77 0.74 7275
1.0 0.76 0.71 0.73 7425
accuracy 0.74 14700
macro avg 0.74 0.74 0.74 14700
weighted avg 0.74 0.74 0.74 14700
roc_curve_ANN(ANN2, 'Artificial Neural Network w/ 2 Hidden Layers','ANN-2')
ROC AUC: 80.44%
With regard to hyperparameter tuning: the smaller the batch size, the noisier the gradient estimate seemed to be and the more time-consuming training was overall. However, adding hidden layers increases model complexity, which is why smaller batch sizes were fed to the deeper model; a larger number of epochs was also required to converge on a minimum.
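The batch-size observation above can be checked numerically: the variance of a mini-batch mean (a stand-in for the mini-batch gradient estimate) shrinks roughly as 1/batch_size. A small NumPy simulation on synthetic values, not the disease dataset:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic per-example "gradients" with mean 1.0 and std 2.0.
population = rng.normal(loc=1.0, scale=2.0, size=100_000)

def batch_mean_variance(batch_size, n_batches=2_000):
    """Empirical variance of the mean of random mini-batches."""
    means = [rng.choice(population, size=batch_size).mean() for _ in range(n_batches)]
    return np.var(means)

for b in (8, 42, 336):
    print(f"batch_size={b:>3}: var of batch mean ~= {batch_mean_variance(b):.4f}")
# Variance drops roughly as 1/b: larger batches give steadier gradient
# estimates, at the cost of fewer weight updates per epoch.
```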
• Apply decision tree learning algorithm and fine tune the model on the disease dataset.
from sklearn.tree import DecisionTreeClassifier
d_tree = DecisionTreeClassifier(random_state=42)
d_tree.fit(X_train, y_train) #scaled
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
max_features=None, max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, presort=False,
random_state=42, splitter='best')
y_predDT = d_tree.predict(X_test)
from sklearn import metrics
print(f"Precision: {round(metrics.precision_score(y_test,y_predDT)*100)}%")
print(f"Recall: {round(metrics.recall_score(y_test,y_predDT)*100)}%")
print(f"Accuracy: {round(metrics.accuracy_score(y_test,y_predDT)*100)}%")
print(confusion_matrix(y_test,y_predDT))
print(classification_report(y_test,y_predDT))
Precision: 65.0%
Recall: 64.0%
Accuracy: 64.0%
[[4698 2577]
[2709 4716]]
precision recall f1-score support
0.0 0.63 0.65 0.64 7275
1.0 0.65 0.64 0.64 7425
accuracy 0.64 14700
macro avg 0.64 0.64 0.64 14700
weighted avg 0.64 0.64 0.64 14700
d_tree = DecisionTreeClassifier(criterion = 'entropy',max_depth = 7,
max_leaf_nodes = 20,
min_samples_leaf = 1,
min_samples_split = 2)
d_tree.fit(X_train, y_train)
DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=7,
max_features=None, max_leaf_nodes=20,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, presort=False,
random_state=None, splitter='best')
y_pred = d_tree.predict(X_test)
from sklearn.metrics import roc_auc_score
print(f"Precision: {round(metrics.precision_score(y_test, y_pred)*100,2)}%")
print(f"Recall: {round(metrics.recall_score(y_test, y_pred)*100,2)}%")
print(f"Accuracy: {round(metrics.accuracy_score(y_test, y_pred)*100,2)}%")
print(f"ROC_AUC: {round(roc_auc_score(y_test, y_pred)*100,2)}%")
Precision: 73.78% Recall: 73.21% Accuracy: 73.33% ROC_AUC: 73.33%
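The tuned values above (entropy, max_depth=7, max_leaf_nodes=20) were settled manually; the same search can be automated with scikit-learn's `GridSearchCV`. A minimal sketch on synthetic stand-in data (the notebook's real `X_train`/`y_train` would be substituted):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 11 disease features.
X, y = make_classification(n_samples=1000, n_features=11, random_state=42)

param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [5, 7, 9],
    "max_leaf_nodes": [10, 20, 40],
}
# Score on recall, the key metric for not missing diseased patients.
search = GridSearchCV(DecisionTreeClassifier(random_state=42),
                      param_grid, scoring="recall", cv=5)
search.fit(X, y)
print(search.best_params_)
```

With 2 x 3 x 3 candidates and 5 folds this is 90 fits, which is cheap for a shallow tree and removes the guesswork from the manual tuning.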
Surprisingly enough, this tuned decision tree yielded some of the most consistent and strongest results of all our models. Precision and recall are well balanced across both classes, and the curve below shows an AUC of 79.87%, indicating that this model performed very well in the context of this data.
print("Best Decision Tree ")
roc_curve_plot(d_tree, 'Decision Tree', 'DT')
Best Decision Tree ROC AUC: 79.87%
df_imp = df.drop('Disease', axis=1)
# Calculate feature importances from the tuned decision tree
importances = d_tree.feature_importances_
feature_importances = pd.DataFrame(importances, index=df_imp.columns,
                                   columns=['Feature Importance']).sort_values('Feature Importance', ascending=False)
feature_importances.head()
fig_fi = px.bar(feature_importances, y = 'Feature Importance', x = feature_importances.index, title="Feature Importance: Decision Tree")
fig_fi.show()
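For a decision tree, `feature_importances_` are the total impurity decreases attributable to splits on each feature, normalized to sum to 1, so the bars above can be read directly as shares of the tree's predictive power. A quick check on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=5, random_state=42)
tree = DecisionTreeClassifier(criterion='entropy', max_depth=7,
                              max_leaf_nodes=20, random_state=42).fit(X, y)
imp = tree.feature_importances_
print(np.isclose(imp.sum(), 1.0), (imp >= 0).all())  # → True True
```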
from plotly import graph_objects as go
fig = go.Figure(layout = {"title": "Model Performance Master Table"},
data=[go.Table(header=dict(values=["Algorithm","Key hyperparameters","Recall (key metric)","Accuracy","Estimated RunTime"]),
cells=dict(values=[["DT","ANN1","ANN2","RF","SVC-RBF","X-GBM","KNN","ANN0","SVC-Lin","GNB"],
["entropy, max_depth=7, max_leaf_nodes=20",
"adam, binary_crossentropy, batch_size:85, epochs:150",
"adam, binary_crossentropy, batch_size:42, epochs:110",
"class_weight=balanced, gini, max_depth=25, max_features=sqrt, max_leaf_nodes:20",
"{C: 2, gamma: 0.8, kernel: rbf}",
"base_score=0.5, booster=gbtree, colsample_bylevel=1, min_child_weight=1, missing=None, n_estimators=100, objective=binary:logistic",
"n_neighbors = 31, p=2",
"adam, binary_crossentropy, batch_size = 100, epochs = 27",
"C: 0.5, dual: False, loss: squared_hinge, penalty: l2",
"priors=None, var_smoothing=1e-09"
],
["0.7321","0.7235","0.7006","0.7003","0.6984","0.6948","0.6897","0.6789","0.6708","0.6315"], #Recall
["0.7333","0.7380","0.7361","0.7088","0.7354","0.7395","0.7284","0.7309","0.7299","0.7193"], #Accuracy
["1 hr","2 hrs","3 hrs","2 hrs","5 hrs","<5 min","<5 min","15 min","45 min","<1 min"]])) #Estimated Runtime
])
fig.show()
df_test= pd.read_csv('Disease Prediction Testing.csv')
df_test.describe()
| ID | Age | Height | Weight | Low Blood Pressure | High Blood Pressure | Smoke | Alcohol | Exercise | |
|---|---|---|---|---|---|---|---|---|---|
| count | 21000.000000 | 21000.000000 | 21000.000000 | 21000.000000 | 21000.000000 | 21000.000000 | 21000.000000 | 21000.000000 | 21000.000000 |
| mean | 10499.500000 | 52.811190 | 164.341381 | 74.241070 | 129.093429 | 95.960857 | 0.087810 | 0.052667 | 0.804952 |
| std | 6062.322162 | 6.775489 | 8.195082 | 14.548468 | 167.975674 | 157.257409 | 0.283024 | 0.223372 | 0.396247 |
| min | 0.000000 | 29.000000 | 64.000000 | 21.000000 | 10.000000 | -70.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 5249.750000 | 48.000000 | 159.000000 | 65.000000 | 120.000000 | 80.000000 | 0.000000 | 0.000000 | 1.000000 |
| 50% | 10499.500000 | 53.000000 | 165.000000 | 72.000000 | 120.000000 | 80.000000 | 0.000000 | 0.000000 | 1.000000 |
| 75% | 15749.250000 | 58.000000 | 170.000000 | 82.000000 | 140.000000 | 90.000000 | 0.000000 | 0.000000 | 1.000000 |
| max | 20999.000000 | 64.000000 | 250.000000 | 183.000000 | 16020.000000 | 8500.000000 | 1.000000 | 1.000000 | 1.000000 |
# Clipping blood pressure to just past the upper/lower fence values, so every
# out-of-fence reading collapses onto a single sentinel that gets flagged below
df_test["High Blood Pressure"] = df_test["High Blood Pressure"].clip(upper=106)  # upper fence = 105
df_test["High Blood Pressure"] = df_test["High Blood Pressure"].clip(lower=65)   # lower fence = 65
df_test["Low Blood Pressure"] = df_test["Low Blood Pressure"].clip(upper=171)    # upper fence = 170
df_test["Low Blood Pressure"] = df_test["Low Blood Pressure"].clip(lower=89)     # lower fence = 90
# Clipping Height and Weight values to their 1st Percentile
df_test["Height"] = df_test["Height"].clip(lower=147)
df_test["Weight"] = df_test["Weight"].clip(lower=49)
# Replacing the clipped sentinel values with NaN so MICE can impute them
df_test.loc[df_test['High Blood Pressure'] == 106,'High Blood Pressure'] = np.nan
df_test.loc[df_test['High Blood Pressure'] == 65,'High Blood Pressure'] = np.nan
df_test.loc[df_test['Low Blood Pressure'] == 171,'Low Blood Pressure'] = np.nan
df_test.loc[df_test['Low Blood Pressure'] == 89,'Low Blood Pressure'] = np.nan
df_test.loc[df_test['Weight'] == 49,'Weight'] = np.nan
df_test.loc[df_test['Height'] == 147,'Height'] = np.nan
# NOTE: here males are encoded as 1 and females as 0. The assignment is arbitrary, but our models require numeric inputs
df_test["Gender"] = df_test["Gender"].map({'male': 1, 'female': 0})
#df= pd.get_dummies(df_test, columns = ['Gender','Cholesterol','Glucose'])
df_test["Cholesterol"] = df_test["Cholesterol"].map({'too high': 2, 'high': 1, 'normal':0})
df_test["Glucose"] = df_test["Glucose"].map({'too high': 2, 'high': 1, 'normal':0})
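One pitfall of this ordinal encoding: `Series.map` leaves any value missing from the dictionary as NaN, so the category spellings must match the file exactly. A quick sketch (the misspelled 'Normal' is a hypothetical example):

```python
import pandas as pd

chol = pd.Series(['normal', 'high', 'too high', 'Normal'])
encoded = chol.map({'too high': 2, 'high': 1, 'normal': 0})
print(encoded.tolist())  # → [0.0, 1.0, 2.0, nan] — the unmatched spelling becomes NaN
```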
from fancyimpute import IterativeImputer
MICE_imputer = IterativeImputer()
df_test_MICE = df_test.copy()  # copy so the imputed frame does not alias the original
df_test_MICE.iloc[:,:] = MICE_imputer.fit_transform(df_test)
df_test = df_test_MICE
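fancyimpute's IterativeImputer follows the same interface as scikit-learn's (which must be enabled via the experimental `enable_iterative_imputer` flag); a minimal sketch of the MICE-style fill-in on a toy frame:

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

toy = pd.DataFrame({'Height': [160.0, np.nan, 172.0, 168.0],
                    'Weight': [60.0, 75.0, np.nan, 70.0]})
# Each feature with missing values is modeled as a regression on the others,
# and the NaNs are replaced with the model's estimates
toy.iloc[:, :] = IterativeImputer(random_state=0).fit_transform(toy)
print(toy.isna().sum().sum())  # → 0
```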
#making our final predictions
df_test= df_test[['Age','Gender','Height','Weight','High Blood Pressure','Low Blood Pressure','Cholesterol','Glucose','Smoke','Alcohol','Exercise']]
df_test.describe()
| Age | Gender | Height | Weight | High Blood Pressure | Low Blood Pressure | Cholesterol | Glucose | Smoke | Alcohol | Exercise | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 21000.000000 | 21000.000000 | 21000.000000 | 21000.000000 | 21000.000000 | 21000.000000 | 21000.000000 | 21000.000000 | 21000.000000 | 21000.000000 | 21000.000000 |
| mean | 52.811190 | 0.349190 | 164.623165 | 74.530922 | 81.748464 | 126.366216 | 0.368476 | 0.227762 | 0.087810 | 0.052667 | 0.804952 |
| std | 6.775489 | 0.476726 | 7.586457 | 14.198626 | 7.883681 | 15.503019 | 0.682474 | 0.573790 | 0.283024 | 0.223372 | 0.396247 |
| min | 29.000000 | 0.000000 | 148.000000 | 50.000000 | 66.000000 | 90.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 48.000000 | 0.000000 | 159.000000 | 65.000000 | 80.000000 | 120.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| 50% | 53.000000 | 0.000000 | 165.000000 | 72.000000 | 80.000000 | 120.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| 75% | 58.000000 | 1.000000 | 170.000000 | 82.000000 | 90.000000 | 140.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| max | 64.000000 | 1.000000 | 250.000000 | 183.000000 | 105.000000 | 170.000000 | 2.000000 | 2.000000 | 1.000000 | 1.000000 | 1.000000 |
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(df_test)
scaled_test = scaler.transform(df_test)  # scaling the testing data to between 0 and 1
y_LR_prediction = bestLR.predict(scaled_test)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(df_test) # standardizing the testing data to zero mean and unit variance
scaled_test = scaler.transform(df_test)
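Note that the two scalers behave quite differently: MinMaxScaler maps each column onto [0, 1], while StandardScaler centers each column to zero mean and unit variance, so standardized values are not bounded and routinely fall outside [-1, 1]. A quick contrast on a toy column:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[29.0], [48.0], [53.0], [64.0]])  # e.g. a hypothetical Age column
mm = MinMaxScaler().fit_transform(X)            # rescaled to the [0, 1] range
ss = StandardScaler().fit_transform(X)          # zero mean, unit variance

print(mm.min(), mm.max())                        # → 0.0 1.0
print(abs(ss.mean()) < 1e-9, ss.max() > 1.0)     # → True True
```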
# Predicting labels for 4 remaining models
y_DT_prediction = d_tree.predict(scaled_test)
y_ANN0 = ANN0.predict_classes(scaled_test)
y_ANN1 = ANN1.predict_classes(scaled_test)
y_ANN2= ANN2.predict_classes(scaled_test)
#Outputting predictions to the final dataframe for evaluation.
df_output = pd.read_csv('Disease Prediction Testing.csv')
Final = pd.DataFrame(list(zip(df_output['ID'], y_DT_prediction,y_LR_prediction,
y_ANN0,y_ANN1,y_ANN2)),
columns=['ID','DT','LR','ANN0','ANN1','ANN2'])
Final["ANN0"] = Final.ANN0.astype(float)
Final["ANN1"] = Final.ANN1.astype(float)
Final["ANN2"] = Final.ANN2.astype(float)
Final.head()
| ID | DT | LR | ANN0 | ANN1 | ANN2 | |
|---|---|---|---|---|---|---|
| 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 2 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 3 | 3 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 4 | 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
Final.to_csv('HW4_pred_Ondocin.csv', index=False)